Limitations of Cross-Lingual Learning from Image Search

نویسندگان

  • Mareike Hartmann
  • Anders Søgaard
چکیده

Cross-lingual representation learning is an important step in making NLP scale to all the world’s languages. Recent work on bilingual lexicon induction suggests that it is possible to learn cross-lingual representations of words based on similarities between images associated with these words. However, that work focused on the translation of selected nouns only. In our work, we investigate whether the meaning of other parts-of-speech, in particular adjectives and verbs, can be learned in the same way. We also experiment with combining the representations learned from visual data with embeddings learned from textual data. Our experiments across five language pairs indicate that previous work does not scale to the problem of learning cross-lingual representations beyond simple nouns.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multilingual Plagiarism Detection

Cross lingual plagiarism detection has recently caught attention due to copy-right violations occurring in many fields such as education, journalism, scientific research, literature, screenplays, etc, where an author would translate an article in language L1 into language L2 and then either publish/submit it or change some of the sentences to suit his/her motivations. Therefore, the need for a ...

متن کامل

Cross-Lingual Image Search on the Web

Most people locate images on the Web by querying image search engines such as Google’s. The images are tagged by the words in their “vicinity”, which limits the ability of a searcher to retrieve them. Although images are universal, an English searcher will fail to find images tagged in Chinese, and a Spanish searcher will fail to find images tagged in English. Cross-lingual homonyms cause probl...

متن کامل

Semi-Supervised Representation Learning for Cross-Lingual Text Classification

Cross-lingual adaptation aims to learn a prediction model in a label-scarce target language by exploiting labeled data from a labelrich source language. An effective crosslingual adaptation system can substantially reduce the manual annotation effort required in many natural language processing tasks. In this paper, we propose a new cross-lingual adaptation approach for document classification ...

متن کامل

Image-Mediated Learning for Zero-Shot Cross-Lingual Document Retrieval

We propose an image-mediated learning approach for cross-lingual document retrieval where no or only a few parallel corpora are available. Using the images in image-text documents of each language as the hub, we derive a common semantic subspace bridging two languages by means of generalized canonical correlation analysis. For the purpose of evaluation, we create and release a new document data...

متن کامل

Learning Cross-lingual Word Embeddings via Matrix Co-factorization

A joint-space model for cross-lingual distributed representations generalizes language-invariant semantic features. In this paper, we present a matrix cofactorization framework for learning cross-lingual word embeddings. We explicitly define monolingual training objectives in the form of matrix decomposition, and induce cross-lingual constraints for simultaneously factorizing monolingual matric...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1709.05914  شماره 

صفحات  -

تاریخ انتشار 2017